Abstract

In this paper, we use semi-definite programming and generalized principal component 
analysis (GPCA) to distinguish between two or more different facial expressions. In the 
first step, semi-definite programming is used to reduce the dimension of the image data and 
"unfold" the manifold which the data points (corresponding to facial expressions) reside on. 
Next, GPCA is used to fit a series of subspaces to the data points and associate each data 
point with a subspace. Data points that belong to the same subspace are claimed to belong 
to the same facial expression category. An example is provided. 



1. Mathematical Preliminaries 



In this section, we introduce notation, several definitions, and some key results in abstract
algebra and algebraic geometry [HOEJH] that are necessary for developing the main results
of this paper. Specifically, for $A \in \mathbb{R}^{n \times n}$ we write $A \ge 0$ (resp., $A > 0$) to indicate that $A$ is
a nonnegative-definite (resp., positive-definite) matrix. In addition, $(\cdot)^T$ denotes transpose,
and $(\cdot)^{\dagger}$ denotes the Moore-Penrose generalized inverse. In the next paragraphs we give the
definitions of an ideal and the Veronese map.

Definition 1.1 (Ideal). Let $\mathcal{R}$ be a commutative ring and $I$ be an additive subgroup of
$\mathcal{R}$. $I$ is called an ideal if $r \in \mathcal{R}$ and $s \in I$ imply $rs \in I$. Furthermore, an ideal is said to be
generated by a set $S$ if every $t \in I$ can be written as $t = \sum_{i=1}^{n} r_i s_i$, with $r_i \in \mathcal{R}$, $s_i \in S$,
$i = 1, 2, \ldots, n$, for some $n \in \mathbb{N}$.

Let $R[x]$ be the set of polynomials in $D$ variables, where $x = [x_1\ x_2\ \cdots\ x_D]^T$, $x_i \in R$,
$i = 1, 2, \ldots, D$, and $R$ is a field. Then $R[x]$ with polynomial addition and multiplication
is a commutative ring. A product of $n$ variables $x_1, x_2, \ldots, x_n$ is called a monomial of degree
$n$ (counting repeats). The number of distinct monomials of degree $n$ is given by

$$M_n(D) \triangleq \binom{D+n-1}{n}. \tag{1}$$

Definition 1.2 (Veronese Map) [1]. The Veronese map of degree $n$, $\nu_n : R^D \to R^{M_n(D)}$,
is a mapping that assigns to $D$ variables $x_1, x_2, \ldots, x_D$ all the possible monomials of degree
$n$, namely,

$$\nu_n([x_1\ x_2\ \cdots\ x_D]^T) = [u_1\ u_2\ \cdots\ u_{M_n(D)}]^T$$

such that $u_i = x_1^{n_{i1}} x_2^{n_{i2}} \cdots x_D^{n_{iD}}$, $i = 1, 2, \ldots, M_n(D)$, where $n_{i1} + n_{i2} + \cdots + n_{iD} = n$, $n_{ij} \in \mathbb{N}$,
$j = 1, 2, \ldots, D$, and $M_n(D)$ is given by (1).
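The Veronese map and the monomial count in (1) can be illustrated with a minimal Python sketch. The function names `veronese` and `num_monomials` are hypothetical, and the lexicographic ordering of the monomials is an assumption made here; the definition above does not fix an ordering.

```python
from itertools import combinations_with_replacement
from math import comb, prod

def num_monomials(n: int, D: int) -> int:
    """M_n(D): number of distinct degree-n monomials in D variables, as in (1)."""
    return comb(D + n - 1, n)

def veronese(x, n: int):
    """Degree-n Veronese map of the vector x (any sequence of numbers).

    Monomials are ordered lexicographically by variable index (an assumed
    convention), e.g. for D = 2 and n = 2 the output is [x1^2, x1*x2, x2^2].
    """
    D = len(x)
    return [prod(x[i] for i in idx)
            for idx in combinations_with_replacement(range(D), n)]
```

For instance, `veronese([2, 3], 2)` evaluates the three degree-2 monomials of two variables at the point (2, 3), and the length of the output always equals `num_monomials(n, D)`.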

2. Dimension Reduction 

In this section, we introduce a method known as Maximum Variance Unfolding (MVU), 
a dimension reduction technique which uses semi-definite programming. Given a set of 
points sampled from a low dimensional manifold in a high dimensional ambient space, this 
technique "unfolds" the manifold (and hence, the points it contains) while preserving the 
local geometrical properties of the manifold [5]. This method can, in a sense, be regarded as a
nonlinear generalization of Principal Component Analysis (PCA). Given a set of points
in a high dimensional ambient space, PCA identifies a low dimensional subspace such that
the variance of the projection of the points on this subspace is maximized. More specifically,
a basis for the subspace on which the projection of the points has maximum variance is given by
the eigenvectors corresponding to the non-zero eigenvalues of the covariance matrix pj. In 
the case where data is noisy, the singular vectors corresponding to the dominant singular 
values of the covariance matrix are selected [1]. 
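The PCA step described above can be sketched as follows. This is a minimal illustration assuming NumPy, one data point per row, and a sample covariance normalized by the number of points; the function name `pca_basis` is hypothetical.

```python
import numpy as np

def pca_basis(X: np.ndarray, d: int):
    """Return (basis, projections) for the top-d principal subspace.

    X holds one data point per row (N x D). The basis columns are the
    eigenvectors of the sample covariance with the d largest eigenvalues.
    """
    Xc = X - X.mean(axis=0)                 # center the data
    cov = Xc.T @ Xc / X.shape[0]            # D x D sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:d]   # indices of the d largest eigenvalues
    basis = eigvecs[:, order]               # D x d orthonormal basis
    return basis, Xc @ basis                # basis and projected coordinates
```

With noisy data, `d` plays the role of the number of dominant eigenvalues that are retained.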

Given $N$ input points $\{x_n\}_{n=1}^{N} \subset \mathbb{R}^D$, we would like to find $N$ output points $\{y_n\}_{n=1}^{N} \subset \mathbb{R}^d$
such that $d < D$, there is a one-to-one correspondence between these sets, and points close
to each other in the input data set remain close in the output data set. To be
more precise, we need to introduce the concept of isometry for a set of points [HE]. Loosely
speaking, isometry is an invertible smooth mapping defined on a manifold such that it locally 
has a translation and rotation effect. The next definition extends the notion of isometry to 
data sets. 

Definition 2.1 [5]. Let $X = \{x_n\}_{n=1}^{N} \subset \mathbb{R}^D$ and $Y = \{y_n\}_{n=1}^{N} \subset \mathbb{R}^d$ be two sets of points
that are in one-to-one correspondence. Then $X$ and $Y$ are $k$-locally isometric if there exists
a mapping consisting of rotation and translation $T : \mathbb{R}^D \to \mathbb{R}^d$ such that if $T(x_n) = y_n$ then
$T(N_{x_n}(k)) = N_{y_n}(k)$, for $n = 1, 2, \ldots, N$, where $N_x(k)$ is the set of $k$-nearest neighbors of
$x \in X$.

Before stating the MVU method, we give the problem statement. 

Problem 2.1. Given a set of input data points $X = \{x_n\}_{n=1}^{N} \subset \mathbb{R}^D$, find the output data
points $Y = \{y_n\}_{n=1}^{N} \subset \mathbb{R}^d$, $d \le D$, such that the sum of pairwise squared distances between
outputs, namely,

$$\Phi \triangleq \sum_{i=1}^{N} \sum_{j=1}^{N} \|y_i - y_j\|^2, \tag{2}$$

is maximized and $X$ and $Y$ are $k$-locally isometric for some $k \in \mathbb{N}$. Without loss of generality,
we assume that $\sum_{n=1}^{N} x_n = 0$. Moreover, we require $\sum_{n=1}^{N} y_n = 0$ to remove the translational
degree of freedom of the output points $Y$.

Note that the data set can be represented by a weighted graph $G$, where each node represents
a point and each point is connected by edges to its $k$ nearest points, where $k$ is a given parameter.
The weights represent the distances between the nodes. Furthermore, we assume that
the corresponding graph $G$ is connected. In the case of a disconnected graph, each connected
component should be analyzed separately. The $k$-local isometry condition in Problem 2.1
requires that the distances and the angles between the $k$-nearest neighbors be preserved.
This constraint is equivalent to merely preserving the distances between neighboring points
in a modified graph $G'$, in which, for each node, all the neighboring nodes are also connected
to one another by an edge. More precisely, in $G'$, each node and its $k$ neighboring nodes form a
clique of size $k + 1$ (see Figure 2.1).
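The construction of $G$ and of the modified graph $G'$ can be sketched as follows. This is an illustrative NumPy implementation under two assumptions that the text does not spell out: the neighbor relation is symmetrized to obtain an undirected graph, and Euclidean distance is used. The function names are hypothetical.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors of every row of X (N x k array)."""
    X = np.asarray(X, dtype=float)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
    np.fill_diagonal(sq, np.inf)           # a point is not its own neighbor
    return np.argsort(sq, axis=1)[:, :k]

def graph_adjacency(X, k):
    """Adjacency of G: i and j are joined if either is a k-nearest neighbor of the other."""
    nbrs = knn_indices(X, k)
    N = nbrs.shape[0]
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        A[i, nbrs[i]] = 1
    return np.maximum(A, A.T)              # make the graph undirected

def modified_adjacency(X, k):
    """Adjacency eta of G': each node and its k nearest neighbors form a clique."""
    nbrs = knn_indices(X, k)
    N = nbrs.shape[0]
    eta = np.zeros((N, N), dtype=int)
    for i in range(N):
        clique = np.append(nbrs[i], i)     # node i together with its neighbors
        eta[np.ix_(clique, clique)] = 1
    np.fill_diagonal(eta, 0)               # no self-loops
    return eta
```

Every edge of $G$ is also an edge of $G'$; the extra edges of $G'$ join pairs of nodes that share a common node as nearest neighbor, which is what encodes the angle constraints as distance constraints.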

The next theorem gives the solution to Problem 2.1 for the case $d = D$.

Theorem 2.1 [5]. Consider the problem given by Problem 2.1 and assume $d = D$. The
output data points $Y = \{y_n\}_{n=1}^{N} \subset \mathbb{R}^D$ are given by the solution to the optimization problem

$$\max \Phi, \tag{3}$$

subject to

$$\sum_{n=1}^{N} y_n = 0, \tag{4}$$

$$\|y_i - y_j\|^2 = D_{ij}, \quad \text{if } \eta_{ij} = 1, \quad i, j = 1, 2, \ldots, N, \tag{5}$$

where $\Phi$ is defined in (2), $\eta = [\eta_{ij}] \in \mathbb{R}^{N \times N}$ is the adjacency matrix of the modified graph
$G'$, and $D_{ij} = \|x_i - x_j\|^2$, $i, j = 1, 2, \ldots, N$, $x_i, x_j \in X$.
Figure 2.1: The original and modified graphs for k = 2



The optimization problem (3)-(5) is not convex. The following convex optimization
problem, however, is equivalent to the optimization problem given in Theorem 2.1. Moreover,
the next theorem also addresses the case where $d < D$.

Theorem 2.2 [5]. Consider the problem given by Problem 2.1 and assume that $d = D$.
The output data points $Y = \{y_n\}_{n=1}^{N} \subset \mathbb{R}^D$ are given by the solution to the optimization
problem

$$\max \operatorname{tr}(K), \tag{6}$$

subject to

$$K \ge 0, \tag{7}$$

$$\sum_{i=1}^{N} \sum_{j=1}^{N} K_{ij} = 0, \tag{8}$$

$$K_{ii} - 2K_{ij} + K_{jj} = D_{ij}, \quad \text{if } \eta_{ij} = 1, \quad i, j = 1, 2, \ldots, N, \tag{9}$$

where $K = [K_{ij}]$ is the inner product matrix defined by $K_{ij} = y_i^T y_j$, $i, j = 1, 2, \ldots, N$, and $\eta$
and $D_{ij}$ are defined in Theorem 2.1. Moreover,

$$y_{ni} = \sqrt{l_n}\, V_{ni}, \quad i = 1, 2, \ldots, D, \quad n = 1, 2, \ldots, N, \tag{10}$$

where $V_n = [V_{n1}\ V_{n2}\ \cdots\ V_{nD}]^T$, $n = 1, 2, \ldots, N$, is the eigenvector of $K$, $l_n$ is its associated
eigenvalue, and $y_n = [y_{n1}\ y_{n2}\ \cdots\ y_{nD}]^T$. Furthermore, if $K$ has $d$ non-zero eigenvalues, then
the output data points given by $\{y_n^{\mathrm{reduced}}\}_{n=1}^{N} \subset \mathbb{R}^d$ can be found by removing the zero
elements in $y_n$.



Remark 2.1. When the data is noisy, usually all the eigenvalues of $K$ are non-zero, and
therefore one has to choose the dominant eigenvalues (see [1] for some techniques for choosing
the dominant eigenvalues). If the eigenvalues of $K$ are sorted in descending order, the
first $d$ elements of $y_n$, $n = 1, 2, \ldots, N$, form a $d$-dimensional data set that is approximately
$k$-locally isometric to $\{x_n\}_{n=1}^{N} \subset \mathbb{R}^D$.
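The role of the inner product matrix $K$ can be checked numerically. The sketch below does not solve the semi-definite program itself; it only illustrates, on synthetic centered points, why the constraints on $K$ mirror those on the outputs (for centered outputs, the entries of $K$ sum to zero and the sum of pairwise squared distances equals $2N\,\mathrm{tr}(K)$), and how coordinates are read off from the eigendecomposition of $K$. The function name `embedding_from_gram`, the tolerance `tol`, and the random test data are assumptions made for illustration.

```python
import numpy as np

def embedding_from_gram(K, d=None, tol=1e-9):
    """Coordinates whose inner products reproduce K (one output point per row).

    Column a of the result is sqrt(l_a) times the a-th unit eigenvector of K,
    with eigenvalues sorted in descending order. If d is given, only the d
    dominant eigenvalues are kept; otherwise those larger than tol are kept.
    """
    eigvals, eigvecs = np.linalg.eigh(K)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = slice(0, d) if d is not None else eigvals > tol
    return eigvecs[:, keep] * np.sqrt(np.clip(eigvals[keep], 0.0, None))

# Small centered synthetic data set (illustrative values only).
rng = np.random.default_rng(0)
Y = rng.standard_normal((6, 3))
Y -= Y.mean(axis=0)                               # enforce sum_n y_n = 0
K = Y @ Y.T                                       # K_ij = y_i^T y_j
sq_dist = np.sum((Y[:, None] - Y[None, :]) ** 2, axis=-1)
diag = np.diag(K)
# K_ii - 2 K_ij + K_jj reproduces the squared distances used in the constraints.
gram_dist = diag[:, None] - 2 * K + diag[None, :]
```

Passing `d` to `embedding_from_gram` corresponds to keeping only the dominant eigenvalues in the noisy case.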
3. Data Segmentation and Subspace Identification



In this section, we address the problem of data segmentation and subspace identification 
for a set of given data points. Next, we define the multiple subspace segmentation problem. 



Problem 3.1 (Multiple Subspace Segmentation Problem). Given the set $Y = \{y_i\}_{i=1}^{N}$
of points drawn from a set of distinct subspaces of unknown number and dimension, we
would like to (i) find the number of subspaces, (ii) find their dimensions, (iii) find a basis
for each subspace, and (iv) associate each point with the subspace it belongs to.

Generalized Principal Component Analysis (GPCA) uses algebraic geometric concepts 
to address this problem. First, we present the basic GPCA algorithm and later introduce 
the version of GPCA which is more robust to noise. For a detailed treatment of the subject 
see [1]. 



3.1. Basic GPCA 



In this section we present the Basic GPCA algorithm, where we assume that the data points
are noise-free. The GPCA algorithm consists of three main steps: (i) polynomial fitting,
(ii) polynomial differentiation, and (iii) polynomial division. Let $\mathcal{A} = \{S_1, S_2, \ldots, S_n\}$ be
a subspace arrangement and $Z_{\mathcal{A}} = S_1 \cup S_2 \cup \cdots \cup S_n$, where $\dim(S_j) = d_j$, $j = 1, 2, \ldots, n$.
Furthermore, let $Y = \{y_i\}_{i=1}^{N}$ be a set of a sufficiently large number of points sampled from $Z_{\mathcal{A}}$.
In this paper, we assume that the number of subspaces $n$ is known. The GPCA algorithm,
however, gives the solution for the case where $n$ is unknown (see [1]). In order to represent
$Z_{\mathcal{A}}$ algebraically, we need to find the vanishing ideal of $Z_{\mathcal{A}}$, namely $I(Z_{\mathcal{A}})$. The vanishing ideal is
the set of polynomials that vanish on $Z_{\mathcal{A}}$. It can be shown that the homogeneous component
of degree $n$ of $I(Z_{\mathcal{A}})$, namely $I_n$, uniquely determines $I(Z_{\mathcal{A}})$. Therefore, in order to find the vanishing
ideal $I(Z_{\mathcal{A}})$ it suffices to determine the homogeneous component $I_n$.

Now note that if $p_n(x)$ is a polynomial in $I_n$ then $p_n(x) = c_n^T \nu_n(x)$, $c_n \in \mathbb{R}^{M_n(D)}$, where
$x = [x_1\ x_2\ \cdots\ x_D]^T$, for some $D \in \mathbb{N}$, and $M_n(D)$ is given by (1). Therefore, each point
$y_i$, $i = 1, 2, \ldots, N$, should satisfy $p_n(y_i) = 0$, hence $V_n(D) c_n = 0$, where

$$V_n(D) \triangleq \begin{bmatrix} \nu_n^T(y_1) \\ \nu_n^T(y_2) \\ \vdots \\ \nu_n^T(y_N) \end{bmatrix} \tag{11}$$

is called the embedded data matrix. A one-to-one correspondence between the null space of
$V_n(D)$ and the polynomials in $I_n$ exists if the following condition holds

$$\dim(\mathcal{N}(V_n(D))) = \dim(I_n) = h_I(n), \tag{12}$$

or equivalently,

$$\operatorname{rank}(V_n(D)) = M_n(D) - h_I(n), \tag{13}$$

where $h_I(n)$ is the Hilbert function. The singular vectors of $V_n(D)$ represented by $c_{ni}$,
$i = 1, 2, \ldots, h_I(n)$, corresponding to the zero singular values of $V_n(D)$ can be used to compute
a basis for $I_n$, namely

$$I_n = \operatorname{span}\{p_{ni}(x) = c_{ni}^T \nu_n(x),\ i = 1, 2, \ldots, h_I(n)\}.$$

In the case where the data $Y$ is corrupted by noise, the singular vectors corresponding to
the $h_I(n)$ smallest singular values of $V_n(D)$ can be used.
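The polynomial fitting step can be sketched on a toy arrangement. The example below is an assumption-laden illustration, not the authors' implementation: it builds the embedded data matrix for points sampled from two lines in the plane and extracts the coefficient vectors from the right singular vectors with the smallest singular values. The function names and the lexicographic monomial ordering are hypothetical choices.

```python
import numpy as np
from itertools import combinations_with_replacement

def veronese(x, n):
    """Degree-n Veronese map of x, monomials in lexicographic order (assumed)."""
    D = len(x)
    return np.array([np.prod([x[i] for i in idx])
                     for idx in combinations_with_replacement(range(D), n)])

def embedded_data_matrix(Y, n):
    """Matrix whose i-th row is the Veronese embedding of y_i (N x M_n(D))."""
    return np.vstack([veronese(y, n) for y in Y])

def vanishing_coefficients(Y, n, h):
    """Right singular vectors for the h smallest singular values of the embedded matrix."""
    V = embedded_data_matrix(Y, n)
    _, s, Vt = np.linalg.svd(V)        # singular values in descending order
    return Vt[-h:].T, s                # coefficient matrix of shape M_n(D) x h

# Toy data: points on the lines x2 = 0 and x1 = x2 in the plane.
t = np.array([-2.0, -1.0, 1.0, 2.0])
Y = np.vstack([np.column_stack([t, np.zeros_like(t)]),
               np.column_stack([t, t])])
C, s = vanishing_coefficients(Y, n=2, h=1)
```

For this toy data the single coefficient vector in `C` is proportional to the coefficients of $x_2(x_1 - x_2) = x_1 x_2 - x_2^2$, the product of the two linear forms that define the lines.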

The following theorem shows how polynomial differentiation can be used to find the 
dimensions and bases of each subspace. 

Theorem 3.1 [1]. Let $Y = \{y_i\}_{i=1}^{N}$ be a set of points sampled from $Z_{\mathcal{A}} = S_1 \cup S_2 \cup \cdots \cup S_n$,
where $S_i$ is a subspace of unknown dimension $d_i$, $i = 1, 2, \ldots, n$. Furthermore, assume that
for each subspace $S_j$, $j = 1, 2, \ldots, n$, a point $w_j$ is given such that $w_j \in S_j$, $w_j \notin S_i$, $i \ne j$,
$i = 1, 2, \ldots, n$, and condition (12) holds. Then

$$S_j^{\perp} = \operatorname{span}\left\{ \frac{\partial}{\partial x} c_n^T \nu_n(x) \Big|_{x = w_j} : c_n \in \mathcal{N}(V_n(D)) \right\}, \tag{14}$$

where $V_n(D)$ is the embedded data matrix of $Y$. Furthermore, $d_j = D - \operatorname{rank}(\nabla P_n(w_j))$,
$j = 1, 2, \ldots, n$, where $P_n(x) = [p_{n1}(x)\ p_{n2}(x)\ \cdots\ p_{n h_I(n)}(x)] \in \mathbb{R}^{1 \times h_I(n)}$ is a row vector of
independent polynomials in $I_n$ found using the singular vectors corresponding to the zero
singular values of $V_n(D)$, and $\nabla P_n = [\nabla^T p_{n1}(x)\ \nabla^T p_{n2}(x)\ \cdots\ \nabla^T p_{n h_I(n)}(x)] \in \mathbb{R}^{D \times h_I(n)}$.
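The polynomial differentiation step can be illustrated on the simplest case of two lines through the origin in the plane, where a single degree-2 polynomial vanishes on the arrangement and its gradient at a point on one line is normal to that line. The sketch below hard-codes the gradient of a quadratic form in two variables; the function names and the example coefficients are assumptions made for illustration.

```python
import numpy as np

def grad_quadratic(c, x):
    """Gradient of p(x) = c[0]*x1^2 + c[1]*x1*x2 + c[2]*x2^2 at x = (x1, x2)."""
    x1, x2 = x
    return np.array([2 * c[0] * x1 + c[1] * x2,
                     c[1] * x1 + 2 * c[2] * x2])

def unit_normal(c, w):
    """Unit normal of the line through the origin that contains the point w."""
    g = grad_quadratic(c, w)
    return g / np.linalg.norm(g)

def assign_to_lines(Y, normals):
    """Index of the line whose unit normal b gives the smallest |b . y| for each point y."""
    return np.argmin(np.abs(np.asarray(Y) @ np.asarray(normals).T), axis=1)

# p(x) = x2 * (x1 - x2) vanishes on the lines x2 = 0 and x1 = x2.
c = np.array([0.0, 1.0, -1.0])
b1 = unit_normal(c, [2.0, 0.0])    # point on the line x2 = 0
b2 = unit_normal(c, [1.0, 1.0])    # point on the line x1 = x2
labels = assign_to_lines([[3.0, 0.0], [-1.0, -1.0], [0.5, 0.0], [2.0, 2.0]], [b1, b2])
```

Here each gradient has rank one, so each subspace has dimension $2 - 1 = 1$, in agreement with the rank formula of the theorem.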

As a final step, we need a procedure to select a point $w_j$, $j = 1, 2, \ldots, n$, for each subspace.
Without loss of generality let $j = n$. One can show that the first point $w_n$, where $w_n \in S_n$
and $w_n \notin S_i$, $i = 1, 2, \ldots, n-1$, is given by

$$w_n = \operatorname*{arg\,min}_{y \in Y:\ \nabla P_n(y) \ne 0} P_n(y) \left( \nabla^T P_n(y) \nabla P_n(y) \right)^{\dagger} P_n^T(y). \tag{15}$$

Furthermore, a basis for $S_n^{\perp}$ can be found by applying PCA to $\nabla P_n(w_n)$. To find the rest of
the points $w_i \in S_i$, $i = 1, 2, \ldots, n-1$, we can use polynomial division as proposed by the
next theorem.

Theorem 3.2 [1]. Let $Y = \{y_i\}_{i=1}^{N}$ be a set of points sampled from $Z_{\mathcal{A}} = S_1 \cup S_2 \cup \cdots \cup S_n$,
where $S_i$ is a subspace of unknown dimension $d_i$, $i = 1, 2, \ldots, n$, and suppose (12) holds.
Furthermore, let a point $w_n \in S_n$ and $S_n^{\perp}$ be given. Then the set $\bigcup_{i=1}^{n-1} S_i$ is characterized
by the set of homogeneous polynomials

$$\left\{ c_{n-1}^T \nu_{n-1}(x) : V_n(D) R_n(b_n) c_{n-1} = 0,\ \forall b_n \in S_n^{\perp},\ c_{n-1} \in \mathbb{R}^{M_{n-1}(D)} \right\},$$

where $R_n(b_n) \in \mathbb{R}^{M_n(D) \times M_{n-1}(D)}$ is the matrix of coefficients of $c_{n-1}$ when $(b_n^T x)(c_{n-1}^T \nu_{n-1}(x)) = c_n^T \nu_n(x)$
is rearranged to be of the form $R_n(b_n) c_{n-1} = c_n$.

Once the homogeneous polynomials $\{c_{n-1}^T \nu_{n-1}(x)\}$ given in the previous theorem are obtained,
the same procedure can be repeated to find $w_{n-1}$ and the homogeneous polynomials
characterizing $\bigcup_{i=1}^{n-2} S_i$.
3.2. Subspace Estimation Using a Voting Scheme

The Basic GPCA framework works well in the absence of noise. In practice, however, 
noise is always present and efficient statistical methods need to be used in conjunction with 
Basic GPCA. In this section, we present one such statistical method where a voting scheme 
is combined with the Basic GPCA. Here we assume that the number of the subspaces and 
their dimensions are known. For a more complete treatment of the subject see [3]. 

Let $Y = \{y_i\}_{i=1}^{N} \subset \mathbb{R}^D$ be the set of data points sampled from the set $Z_{\mathcal{A}} = S_1 \cup
S_2 \cup \cdots \cup S_n$, where $S_j$, $j = 1, 2, \ldots, n$, is a subspace of dimension $d_j$ and co-dimension
$c_j = D - d_j$. From the discussion in the previous section we know that the homogeneous
component of degree $n$ of the vanishing ideal $I(Z_{\mathcal{A}})$, denoted by $I_n$, uniquely defines $I(Z_{\mathcal{A}})$.
Moreover, we mentioned that $\dim(I_n) = h_I(n)$, where $h_I(n)$ is the Hilbert function. Let
$P = \{p_1(x), p_2(x), \ldots, p_{h_I(n)}(x)\}$ be a basis of $I_n$, which can be found from the singular vectors
corresponding to the $h_I(n)$ smallest singular values of $V_n(D)$, where $V_n(D)$ is the embedded data matrix. Suppose
we choose a point $y_1 \in Y$. Let us define $\nabla P_B(y_1) = [\nabla^T p_1(y_1)\ \nabla^T p_2(y_1)\ \cdots\ \nabla^T p_{h_I(n)}(y_1)]$.
In the noise-free case, $\operatorname{rank}(\nabla P_B(y_1)) = c_j$, where $S_j$ is the subspace containing $y_1$.

However, in the case where the data is corrupted by noise, a more efficient method for
computing the bases is desired. Suppose the co-dimensions of the subspaces take $m$ distinct
values $c'_1, c'_2, \ldots, c'_m$. In the voting scheme, since we do not know which subspace $y_1$ belongs
to and we would like to leave our options open, the bases for the orthogonal complements
of subspaces of all possible co-dimensions $c'_i$, $i = 1, 2, \ldots, m$, are calculated by choosing the
first $c'_i$ principal components of $\nabla P_B(y_1)$. This results in $m$ matrices $B_i \in \mathbb{R}^{D \times c'_i}$, $i = 1, 2, \ldots, m$,
each of which is a candidate basis for $S_j^{\perp}$, $j = 1, 2, \ldots, n$.

The idea of the voting scheme is to count the number of repetitions of each candidate
basis over all points $y_i$, $i = 1, 2, \ldots, N$, in the data set. At the end, the $n$ bases with the most
votes are chosen to be the bases of $S_j^{\perp}$, $j = 1, 2, \ldots, n$, and each point is assigned to the closest
subspace. In our criterion for counting the repetition of the bases, two bases are considered
to be the same if the angle between the two subspaces spanned by them is less than $\tau$, where
$\tau > 0$ is a tolerance parameter.
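The vote counting described above can be sketched as follows. This is a simplified illustration rather than the algorithm of [3]: it assumes that the "angle between two subspaces" is the largest principal angle, measured in radians, and it groups candidates greedily against the first member of each group. The function names `subspace_angle` and `vote` are hypothetical.

```python
import numpy as np

def subspace_angle(B1, B2):
    """Largest principal angle (radians) between span(B1) and span(B2)."""
    Q1, _ = np.linalg.qr(B1)               # orthonormal basis of span(B1)
    Q2, _ = np.linalg.qr(B2)               # orthonormal basis of span(B2)
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return float(np.arccos(np.clip(s.min(), -1.0, 1.0)))

def vote(candidates, tau):
    """Group candidate bases closer than tau; return [representative, votes] pairs by votes."""
    groups = []                            # each entry is [representative basis, vote count]
    for B in candidates:
        for g in groups:
            if g[0].shape == B.shape and subspace_angle(g[0], B) < tau:
                g[1] += 1                  # same subspace up to the tolerance
                break
        else:
            groups.append([B, 1])          # first occurrence of a new candidate
    return sorted(groups, key=lambda g: g[1], reverse=True)
```

The first $n$ entries of the list returned by `vote` would then play the role of the bases with the most votes.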

4. Segmentation of Facial Expressions 

In this section, we use the techniques presented in Sections 2 and 3 to segment the facial
expressions in a given set of images. More specifically, given a set of images of a person 
with two different facial expressions (e.g. neutral and happy), we try to segment the images 
based on their facial expression. We should mention that the author in [8] uses the idea 
of point clustering and PCA to segment images with different facial expressions. In this 
paper, however, we would like to show that if the manifold of faces is "unfolded" (e.g. using 
a Maximum Variance Unfolding technique), different facial expressions reside on different 
subspaces. 

In our experiment, about 30 images of the face of each human subject were taken,
where the subject starts with a neutral expression, transitions to a happy expression, and goes
back to the neutral expression, with each part containing about 10 images. An example set
of images is given in Figure 4.1. The images were taken in a sequence, each of size 200 x 240 pixels,
and in total there were 4 subjects.

Figure 4.1: A sequence of pictures, where the subject starts with a neutral expression,
smiles, and resumes the neutral expression.

Each image can be considered as a vector of dimension 48000, by stacking up all the
columns in the image matrix. In this way, each image is a point in a 48000-dimensional space.
In order to segment the images, first, the dimension is reduced to $D = 5$ using the MVU
procedure presented in Section 2 with $k = 4$, i.e., when forming the weighted graph $G$,
each point is connected by an edge to its 4 nearest neighbors. Next, the resulting points in the $D = 5$
dimensional ambient space are used to identify 2 subspaces of dimension $d = 1, 2, 3, 4$, where
in the GPCA voting algorithm two subspaces are considered to be the same if the angle
between the two is less than $\tau = 0.4$ [9]. The segmentation error for each case is given in
Table 1.
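The vectorization of the images described above can be sketched in a few lines. This is an illustration assuming NumPy arrays as the image representation; the function name `images_to_points` is hypothetical, and the orientation of the 200 x 240 image matrix (which dimension is rows) is not stated in the text, so the shape used in the comment is an assumption.

```python
import numpy as np

def images_to_points(images):
    """Stack the columns of each image matrix into one row vector per image."""
    return np.stack([np.asarray(img, dtype=float).flatten(order="F")
                     for img in images])

# Example with placeholder images: three blank 240 x 200 arrays (assumed orientation)
# become three points in a 48000-dimensional space.
points = images_to_points([np.zeros((240, 200))] * 3)
```

The rows of `points` are the input points $x_n$ to which the dimension reduction step is applied.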

In order to visualize the subspace identification, the segmentation for the case $D = 2$,
$d = 1$ is given in Figure 4.2.
Table 1: Segmentation Results for D = 5

                                Segmentation Error
Subject   Number of Images   d = 1   d = 2   d = 3   d = 4
#1        29                 3       2       2       3
#2        31                 13      13      3       7
#3        31                 6       15      2       4
#4        32                 13      15      1       1

Figure 4.2: Facial expression segmentation with D = 2 and d = 1. The categorization 
error is 6/30. The solid and dashed lines are the subspaces corresponding to the neutral and 
happy expressions, respectively. The points associated with the solid line and the dashed 
line are represented by "+" and "x", respectively. The points with "o" are those that are 
associated with the wrong expression. 